1.  Introduction

Estimation of combined cross-section time series models is
frequently carried out by error components methods as well as by
covariance models. In this paper we analyze the small sample relation
between these estimators. In addition we propose a specific new
estimator, the revised error components estimator, whose small sample
efficiency is always greater than that of either competing estimator.

In this work we rely heavily on previous studies of error component
models. Swamy and Arora [10] developed exact finite sample results and an
a-class of estimators. As one might expect, the revised error component
estimator here, as well as the r-class which includes it, are special cases
of their a-class. In addition readers will see the close dependence on the
work of Nerlove [8], Balestra and Nerlove [2], Arora [1], Fuller and Battese
[3, 4] and Wallace and Hussain [11].

We proceed by briefly restating known results about error components
models. Then we develop, in the two component case, the r-class of estimators
and the specific member which is optimal. Alternative variance estimators
are considered and their relative merit determined. Finally, the three
component model is developed and the results summarized.

2.  Model 

Consider a linear econometric model:

(1)  $y = \beta_0 i + X\beta + \varepsilon$

Here $i$ is an (NT x 1) vector of ones, and $y$ and $\varepsilon$ are (NT x 1). $X$ is
(NT x k) and is measured as deviations from column means so that
$X'i = 0$. $\beta$ is (k x 1).

For simplicity there are N elements of the cross section called
"states" and T time elements. Each state-time combination has exactly one
observation. Within $X$, $y$, and $\varepsilon$, the time subscripts vary most rapidly
and all time observations within a state are adjacent.

The model (1) will be transformed by premultiplication by appropriate
transformation matrices $P_1$, $P_2$, etc. where each P has NT columns. These
will all be expressed as Kronecker products: e.g.

$P = A \otimes B$

where A has N columns and operates on states and B has T columns and operates
on time. Now consider the elements of $\varepsilon$; $\varepsilon_{ij}$ refers to state i, time j.

(2)  $\varepsilon_{ij} = u_{ij} + v_i + w_j$

where each u, v, and w is an independent normal random variable and is
independent of X. All results are conditional on X in this small sample.
The mean of each component is zero and their variances are $\sigma_u^2$, $\sigma_v^2$, $\sigma_w^2$.
The normality assumption is necessary with small samples to provide $\chi^2$ and
F distributions but we show below that the major results do not depend on these
distributions. For the analysis we define two new variances:

(3)  $\sigma_1^2 = \sigma_u^2 + T\sigma_v^2$

     $\sigma_2^2 = \sigma_u^2 + N\sigma_w^2$

It is clear that $\sigma_1^2 \ge \sigma_u^2$ and $\sigma_2^2 \ge \sigma_u^2$. These inequalities play a very
important role below. Employing (2) and (3):

$E\varepsilon = 0$

(4)  $V = E\varepsilon\varepsilon' = \sigma_u^2 (I \otimes I) + (\sigma_1^2 - \sigma_u^2)(I \otimes \tfrac{1}{T}J) + (\sigma_2^2 - \sigma_u^2)(\tfrac{1}{N}J \otimes I)$

where $J = ii'$ is an N x N or T x T matrix of ones. The square matrices in
(4) are each idempotent.

The BLUE estimator of $\beta$ is Generalized Least Squares (GLS) but this
depends on the unknown matrix V. In turn this matrix depends on three
unknown elements $\sigma_u^2$, $\sigma_1^2$ and $\sigma_2^2$. Since V need only be known to a scalar
multiple we define:

(5)  $\gamma_1 = \sigma_u^2/\sigma_1^2$,  $\gamma_2 = \sigma_u^2/\sigma_2^2$

Thus we need only know $\gamma_1$ and $\gamma_2$. During some of the development we will
consider the model as stated (the three component model). For simplicity
we also consider the two component model in which:

(6)  $\sigma_w^2 = 0$,  $\sigma_2^2 = \sigma_u^2$,  $\gamma_2 = 1$

In this situation there is one unknown value $\gamma_1$, which will be called $\gamma$.

3.  Useful  Transformations 

The estimation is facilitated by use of various transformations.
Consider a T by T Helmert matrix $\phi$ [9, p. 13]. The first column is $\phi_1$ and
the remaining (T-1) columns are $\phi_2$. Then $\phi_1 = T^{-1/2} i$ and $\phi$ is orthogonal.
Therefore,

$\phi\phi' = I_T$;  $\phi'\phi = I_T$;  $\phi_1'\phi_1 = 1$;  $\phi_2'\phi_2 = I_{T-1}$

$\phi_1\phi_1' = \tfrac{1}{T}J$;  $\phi_2\phi_2' = I - \tfrac{1}{T}J$;  $\phi_1'\phi_2 = 0$
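The orthogonality relations above can be checked numerically. The following is a minimal sketch in plain Python; the particular Helmert construction and the names `helmert`, `gram` and `within` are assumptions made for illustration, since any orthogonal matrix whose first column is $T^{-1/2} i$ satisfies the same identities.

```python
import math

def helmert(T):
    """Return a T x T orthogonal Helmert-type matrix as a list of columns.

    Column 0 is T**-0.5 times a vector of ones (phi_1); the remaining
    T-1 columns (phi_2) are orthonormal contrasts. This is one common
    construction, assumed here for illustration.
    """
    cols = [[1.0 / math.sqrt(T)] * T]
    for k in range(1, T):
        scale = 1.0 / math.sqrt(k * (k + 1))
        # k entries equal to 1, then a single -k, then zeros
        cols.append([scale] * k + [-k * scale] + [0.0] * (T - k - 1))
    return cols

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

T = 4
phi = helmert(T)
phi2 = phi[1:]

# phi' phi = I_T: the columns are orthonormal
gram = [[dot(ci, cj) for cj in phi] for ci in phi]

# phi_2 phi_2' = I - J/T: the projection onto deviations from the mean
within = [[sum(c[i] * c[j] for c in phi2) for j in range(T)] for i in range(T)]
```

Here `gram` should reproduce the identity matrix and `within` should reproduce $I - \tfrac{1}{T}J$ up to rounding error.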

Now define four transformations which will be used with the two component
model.

(7)  $P_1 = I \otimes \phi_1'$     $P_3 = I \otimes \phi_1\phi_1' = I \otimes \tfrac{1}{T}J = P_1'P_1$

     $P_2 = I \otimes \phi_2'$     $P_4 = I \otimes \phi_2\phi_2' = I \otimes (I - \tfrac{1}{T}J) = P_2'P_2$

where $P_3 + P_4 = I$;  $P_3 P_4 = 0$.

When $P_1$ and $P_2$ are vertically augmented they yield, up to row ordering, the
orthogonal matrix $I \otimes \phi'$ and so together provide independent data sets
which convey all sample information.

We also define (where $\phi$ is now N x N):

$P_5 = \phi_1\phi_1' \otimes I = \tfrac{1}{N}J \otimes I$

$P_6 = \phi_2\phi_2' \otimes I = (I - \tfrac{1}{N}J) \otimes I$

so that

(8)  $V = \sigma_u^2 I + (\sigma_1^2 - \sigma_u^2) P_3 + (\sigma_2^2 - \sigma_u^2) P_5$

We will use this notation to develop initial estimates of $\beta$ as well
as estimates of the unknown variances. We turn exclusively to the two
component model until the results in that case are developed.

(9)  $V = \sigma_u^2 (I - P_3) + \sigma_1^2 P_3 = \sigma_u^2 P_4 + \sigma_1^2 P_3$

(10)  $V^{-1} = \sigma_u^{-2} P_4 + \sigma_1^{-2} P_3$  since $P_3 + P_4 = I$ and $P_3 P_4 = 0$
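The inverse in (10) follows because $P_3$ and $P_4$ are complementary idempotent matrices. A small numerical check is sketched below in plain Python; the dimensions and variance values are hypothetical, and the helper names (`kron`, `combine`, `max_abs_diff`) are introduced only for this illustration.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def combine(a, A, b, B):
    """Return a*A + b*B elementwise."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

N, T = 3, 4                                # hypothetical panel dimensions
sigma_u2, sigma_v2 = 1.0, 0.5              # hypothetical variances
sigma_12 = sigma_u2 + T * sigma_v2         # definition (3)

P3 = kron(identity(N), [[1.0 / T] * T for _ in range(T)])   # I (x) J/T
P4 = combine(1.0, identity(N * T), -1.0, P3)                # I - P3

V = combine(sigma_u2, P4, sigma_12, P3)                     # equation (9)
V_inv = combine(1.0 / sigma_u2, P4, 1.0 / sigma_12, P3)     # equation (10)

# Both differences should be zero up to rounding error.
print(max_abs_diff(matmul(V, V_inv), identity(N * T)))
print(max_abs_diff(matmul(P3, P4), [[0.0] * (N * T) for _ in range(N * T)]))
```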


4.  Estimation of $\sigma_u^2$ and $\sigma_1^2$

Transform (1) by $P_1$:

(11)  $P_1 y = \beta_0 P_1 i + P_1 X\beta + P_1\varepsilon$

(12)  $V_1 = E\,P_1\varepsilon\varepsilon'P_1' = \sigma_1^2 I$


$V_1$ is N by N and ordinary least squares (OLS) yields BLUE estimates given
the transformed data.

Define

(13)  $Z_1 = X'P_1'P_1X = X'P_3X$

(14)  $b_1 = Z_1^{-1} X'P_3 y$

(15)  $b_0 = i'P_3 y/NT = \sum_{i,j} y_{ij}/NT$ = Grand Mean

(16)  var $b_1 = \sigma_1^2 Z_1^{-1}$

(17)  var $b_0 = \sigma_1^2/NT$

The residual vector $e_1$ is employed to estimate $\sigma_1^2$:

(18)  $\hat\sigma_1^2 = e_1'e_1/q$

(19)  $q = N - 1 - k$

$q\hat\sigma_1^2/\sigma_1^2$ follows a Chi Square distribution with q degrees of freedom. Equation
(14) implies that we assume $Z_1$ nonsingular and hence positive definite. The
analysis can proceed without this assumption but that generalization always
brings unwanted confusion. The reader may develop results without it.
Similarly $P_2$ may be employed (and we assume $Z_2$ nonsingular).

(20)  $P_2 y = P_2 X\beta + P_2\varepsilon$

(21)  $V_2 = \sigma_u^2 I$       (NT-N) square

      $Z_2 = X'P_2'P_2X = X'P_4X$

(22)  $b_2 = Z_2^{-1} X'P_4 y$

(23)  var $b_2 = \sigma_u^2 Z_2^{-1}$

The residuals $e_2$ are employed to estimate $\sigma_u^2$:

(24)  $\hat\sigma_u^2 = e_2'e_2/n$

      $n = N(T-1) - k$

As above, $n\hat\sigma_u^2/\sigma_u^2$ follows a chi square distribution, here with n degrees
of freedom.

We note that $b_1$ and $b_2$ are independent and include all the sample
information. They are independent of $\hat\sigma_1^2$ and $\hat\sigma_u^2$, which are also pairwise
independent. We will rely on an estimate of $\gamma$:

(25)  $\hat\gamma = \hat\sigma_u^2/\hat\sigma_1^2$

$\hat\gamma/\gamma$ is clearly distributed F with (n, q) degrees of freedom.
Before proceeding, consider $P_3$ and $P_4$.

$P_2'P_2 = P_4'P_4$ so $b_2 = b_4$. Also $e_4 = P_2'e_2$ so that:

$e_4'e_4 = e_2'P_2P_2'e_2 = e_2'(I \otimes \phi_2'\phi_2)e_2 = e_2'e_2$

Thus $P_2$ and $P_4$ yield identical coefficient vectors and variance estimates.
The same can be shown to be true of $P_1$ and $P_3$.

$P_2$ and $P_4$ are also identical to the direct estimation of (1) by
ordinary least squares after (N-1) state dummy variables are added. It is
the covariance (CV) estimate and we will refer to $b_2$ by that name.
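For a single regressor (k = 1) the between-state quantities ($Z_1$, $b_1$, $\hat\sigma_1^2$) and the within-state quantities ($Z_2$, $b_2$, $\hat\sigma_u^2$) reduce to sums over state means and over deviations from state means. The sketch below illustrates this in plain Python; the function name `between_within` and the data are hypothetical, and the scalar case is an assumption made to avoid matrix algebra.

```python
def between_within(x, y):
    """x, y: lists of N state-level lists, each holding T observations."""
    N, T = len(x), len(x[0])
    k = 1                                      # single regressor in this sketch
    gx = sum(map(sum, x)) / (N * T)            # grand means
    gy = sum(map(sum, y)) / (N * T)
    xbar = [sum(row) / T - gx for row in x]    # centred state means
    ybar = [sum(row) / T - gy for row in y]
    # Between (P3) quantities: Z1 = X'P3X and b1 = Z1^{-1} X'P3y
    Z1 = T * sum(v * v for v in xbar)
    b1 = T * sum(a * b for a, b in zip(xbar, ybar)) / Z1
    # Within (P4) quantities use deviations from the state means
    dx = [[v - sum(row) / T for v in row] for row in x]
    dy = [[v - sum(row) / T for v in row] for row in y]
    Z2 = sum(v * v for row in dx for v in row)
    b2 = sum(a * b for rx, ry in zip(dx, dy) for a, b in zip(rx, ry)) / Z2
    # Residual sums of squares behind the variance estimates (18) and (24)
    ss1 = T * sum((b - b1 * a) ** 2 for a, b in zip(xbar, ybar))
    ss2 = sum((b - b2 * a) ** 2
              for rx, ry in zip(dx, dy) for a, b in zip(rx, ry))
    q, n = N - 1 - k, N * (T - 1) - k
    return {"Z1": Z1, "Z2": Z2, "b1": b1, "b2": b2,
            "s1": ss1 / q, "su": ss2 / n}

# Made-up data: N = 4 states, T = 3 periods
x = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 3.0, 6.0], [5.0, 7.0, 9.0]]
y = [[2.1, 3.9, 6.2], [4.2, 8.1, 11.7], [6.3, 5.8, 12.4], [9.7, 14.2, 18.1]]
est = between_within(x, y)
gamma_hat = est["su"] / est["s1"]              # equation (25)
```

The within estimate `est["b2"]` is the covariance (CV) estimate: it is what OLS with state dummy variables would return for the slope.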

For the development we employ only these estimators of the variances,
but other estimators exist which are in some ways superior and these are
discussed below. We will now develop the Generalized Least Squares and
Error Components estimators based on the analysis so far.

5.      Generalized  Least  Squares 

The GLS estimator can be written employing this notation.

$b_g = [X'V^{-1}X]^{-1}[X'V^{-1}y]$

$X = P_3X + P_4X$

Employing (10), (13), (22):

$X'V^{-1}X = X'(P_3 + P_4)(\sigma_u^{-2}P_4 + \sigma_1^{-2}P_3)(P_3 + P_4)X$

           $= \sigma_u^{-2}X'P_4X + \sigma_1^{-2}X'P_3X$

           $= \sigma_u^{-2}Z_2 + \sigma_1^{-2}Z_1$

           $= \sigma_u^{-2}[Z_2 + \gamma Z_1]$

Employing (5), (14), and (23):

(26)  $b_g = [Z_2 + \gamma Z_1]^{-1}[X'P_4y + \gamma X'P_3y]$

(27)  $b_g = W_2^* b_2 + W_1^* b_1$

where  $W_2^* = [Z_2 + \gamma Z_1]^{-1}Z_2$,

       $W_1^* = [Z_2 + \gamma Z_1]^{-1}\gamma Z_1$,

       $W_2^* + W_1^* = I$

(28)  var $(b_g) = \sigma_u^2 [Z_2 + \gamma Z_1]^{-1}$

These are the well known results which depend on known values for
$\sigma_u^2$ and $\sigma_1^2$, or $\gamma$. (27) shows that $b_g$ is a weighted average of $b_2$
and $b_1$.

6.     Revised  Error  Components 

We introduce a class of estimators, the r-class, of which one
member will be Revised Error Components (REC). For the r-class, $\gamma$ in (26)
is replaced by $r\hat\gamma$ where $0 \le r$. The r-class is a subset of the a-class of
estimators discussed by Swamy and Arora. However, their purpose was the
broad proof of asymptotic distribution, etc., and their multiple parameters,
while desirable for that purpose, are not desirable here.

The r-class estimator is:

(29)  $b_r = [Z_2 + r\hat\gamma Z_1]^{-1}[Z_2 b_2 + r\hat\gamma Z_1 b_1] = W_{2r} b_2 + W_{1r} b_1$

where  $W_{2r} + W_{1r} = I$

$E(b_r|\hat\gamma) = W_{2r} Eb_2 + W_{1r} Eb_1 = (W_{2r} + W_{1r})\beta = \beta$

Hence $b_r$ is conditionally unbiased for any $\hat\gamma$ and r and so also is
unconditionally unbiased. The variance of $b_r$ conditional on $\hat\gamma$ is

(30)  var$(b_r|\hat\gamma) = \sigma_u^2 [Z_2 + r\hat\gamma Z_1]^{-1}[Z_2 + (r^2\hat\gamma^2/\gamma)Z_1][Z_2 + r\hat\gamma Z_1]^{-1}$

Although expressions like (30) are frequently employed to
characterize the efficiency of (29), their use is incorrect because
$\hat\gamma$ is a random variable estimated in the process. The correct matrix
is the expectation of (30) taken over $\hat\gamma$.

The r-class is interesting because it includes other major
estimators as special cases. Error components (EC) has r = 1. Covariance
(CV) has r = 0. GLS has r = 1 and $\hat\gamma = \gamma$. Finally REC, which is
specified below, is a member of this class.
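The special cases can be seen directly in the scalar (k = 1) version of (29), where the matrices $Z_1$ and $Z_2$ become positive numbers. The following is a minimal sketch under that assumption; the function name `r_class` and all numerical inputs are hypothetical.

```python
def r_class(b1, b2, Z1, Z2, gamma_hat, r):
    """Scalar form of (29): a weighted average of b2 (within) and b1 (between).

    The weight on b1 is r*gamma_hat*Z1 / (Z2 + r*gamma_hat*Z1), so it lies
    in [0, 1) for any r >= 0.
    """
    w1 = r * gamma_hat * Z1 / (Z2 + r * gamma_hat * Z1)
    return (1.0 - w1) * b2 + w1 * b1

# Hypothetical inputs
b1, b2, Z1, Z2, gamma_hat = 1.0, 2.0, 4.0, 16.0, 0.5

cv = r_class(b1, b2, Z1, Z2, gamma_hat, r=0.0)   # covariance estimator: equals b2
ec = r_class(b1, b2, Z1, Z2, gamma_hat, r=1.0)   # error components estimator
mid = r_class(b1, b2, Z1, Z2, gamma_hat, r=0.5)  # an intermediate member
```

Setting r = 0 puts all weight on the within estimate, while larger r shifts weight toward the between estimate; GLS corresponds to r = 1 with the true $\gamma$ supplied in place of `gamma_hat`.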

Since every member of the r-class is unbiased we develop REC
by minimizing the variances. In order to proceed we need additional
assumptions about $Z_1$ and $Z_2$. For the development of estimates we employ
an overly strict assumption and subsequently relax it. Our final assumptions
are more specific than we would desire, but we believe that other analysis,
including Monte Carlo analysis, will show the particular estimator to be
more efficient than alternatives in any real case. This is because
our final assumptions cover a substantial breadth of situations.

7.   Relation Between $Z_1$ and $Z_2$.

For this discussion we consider three different factors entering
$Z_1$ and $Z_2$. These matrices have different numbers of underlying observations;
those observations have different variation; and the pattern of variation
is different.

The first two elements, if considered without the third, lead to
our two matrices being proportional to each other. Consider the number
of observations: $Z_1$ derives from $P_1$ and has N state means, while $Z_2$
(from $P_2$) has N(T-1) deviations from those means. Hence $Z_2$ has (T-1)
times the observations of $Z_1$ and will be (T-1) times as large from this
factor alone.

Second, we observe that often variation between states is larger
(say m times as large) than variation within states, so that each observation
making up $Z_1$ will be m times as large as one in $Z_2$. Combining these two
elements, $Z_1$ is $m/(T-1) = \lambda$ times the size of $Z_2$.

The third source of variation is the "pattern of variation." Thus
two variables in X may be very collinear within states (with large cross
product in $Z_2$), but unrelated between states. One very important reason
for employing error components estimation rather than covariance is the
full use of this information source. We temporarily restrict ourselves
to the first two sources of variation and make this assumption:

A1       $Z_1$ is proportional to $Z_2$.

(31)  $Z_1 = \lambda Z_2$

8.   Optimization  of  r  and  Efficiency  Comparison 

Rather than looking directly at the variance of $b_r$, we define the
proportion by which each variance matrix element exceeds the corresponding
element in the variance of the GLS estimator. Our assumption A1 makes this
relative variance (RV) a constant scalar for each matrix. Thus for GLS
under A1:

(32)  var$(b_g) = \sigma_u^2 (1 + \lambda\gamma)^{-1} Z_2^{-1}$

For the r-class:

(33)  var$(b_r|\hat\gamma) = [1 + RV(b_r|\hat\gamma)]$ var$(b_g)$

(34)  $RV(b_r|\hat\gamma) = (1 + \lambda\gamma)(1 + \lambda r^2\hat\gamma^2/\gamma)/(1 + \lambda r\hat\gamma)^2 - 1$

                   $= \lambda\gamma(1 - r\hat\gamma/\gamma)^2/(1 + \lambda r\hat\gamma)^2$

For CV we set r = 0 and for EC we set r = 1:

(35)  CV:   $RV(b_2) = \lambda\gamma$

(36)  EC:   $RV(b_e|\hat\gamma) = \lambda\gamma(1 - \hat\gamma/\gamma)^2/(1 + \lambda\hat\gamma)^2$
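The two forms of the relative variance in (34) are algebraically identical, which can be confirmed numerically. The sketch below is in plain Python; the function names and the parameter values are hypothetical choices for illustration.

```python
def rv_expanded(lam, gamma, gamma_hat, r):
    """First line of (34)."""
    return ((1 + lam * gamma) * (1 + lam * r ** 2 * gamma_hat ** 2 / gamma)
            / (1 + lam * r * gamma_hat) ** 2 - 1)

def rv_compact(lam, gamma, gamma_hat, r):
    """Second line of (34)."""
    return (lam * gamma * (1 - r * gamma_hat / gamma) ** 2
            / (1 + lam * r * gamma_hat) ** 2)

lam, gamma, gamma_hat = 0.4, 0.3, 0.45       # hypothetical values

rv_cv = rv_compact(lam, gamma, gamma_hat, r=0.0)   # (35): equals lam * gamma
rv_ec = rv_compact(lam, gamma, gamma_hat, r=1.0)   # (36)
gap = abs(rv_expanded(lam, gamma, gamma_hat, 0.7)
          - rv_compact(lam, gamma, gamma_hat, 0.7))  # zero up to rounding
```

Note that the relative variance vanishes when $r\hat\gamma = \gamma$, which is the GLS case.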


Our goal now is to minimize (34) with respect to r. The resulting
estimator will have its RV compared with (35) and (36). Clearly we remain
interested in the expected value of (34) and (36) over the distribution of
$\hat\gamma$.

Examine the denominator of (34). Employing the inequality at (3),
we have $\sigma_1^2 \ge \sigma_u^2$ or $\gamma \le 1$. Because the true parameter satisfies this
constraint, we will apply it also to the estimate. Thus (25) is replaced
by

(37)  $\hat\gamma = \hat\sigma_u^2/\hat\sigma_1^2$   if  $\hat\sigma_1^2 \ge \hat\sigma_u^2$

      $\hat\gamma = 1$               if  $\hat\sigma_1^2 < \hat\sigma_u^2$


This constraint is widely employed in current estimation by Error Components.
Given this constraint:

(38)  $1 \le 1 + \lambda r\hat\gamma \le 1 + \lambda r$

In addition, from the discussion of section 7 we know $\lambda$ to be fairly small
in general, and below we set $r \le 1$. Thus the bounds provided by the
inequality are relatively narrow. We note however that under relaxed
assumptions below the counterpart of $\lambda$ may be greater than 1.
Employ (38) with (34):

(39)  $RV(b_r|\hat\gamma) \le \lambda\gamma(1 - r\hat\gamma/\gamma)^2$

      $RV(b_r|\hat\gamma) \ge \lambda\gamma(1 - r\hat\gamma/\gamma)^2/(1 + \lambda r)^2$


A similar pair of inequalities holds for EC when we set r = 1 in (39).
Rather than minimizing the relative variance over r, we will minimize
each limit as given by (39). Consider now the distribution of $\hat\gamma/\gamma$.
Define its mean and variance:

(40)  $\mu = E(\hat\gamma/\gamma)$

      $\sigma^2 + \mu^2 = E(\hat\gamma/\gamma)^2$

Then take the expectation of (39):

(41)  $RV(b_r) = E[RV(b_r|\hat\gamma)] \le \lambda\gamma[1 - 2r\mu + r^2(\sigma^2 + \mu^2)]$

      $RV(b_r) \ge \lambda\gamma[1 - 2r\mu + r^2(\sigma^2 + \mu^2)]/(1 + \lambda r)^2$


The optimal r values for the upper and lower limits are:

(42)  $r_1 = \mu/(\sigma^2 + \mu^2)$

      $r_2 = (\mu + \lambda)/(\sigma^2 + \mu^2 + \lambda\mu)$

Since we wish to minimize the largest possible variance, we place greatest
weight on the upper limit value, $r_1$. Also we note that as $\lambda$ approaches zero,
$r_2$ goes to $r_1$ and that $r_1$ is free of dependence on $\lambda$.

Thus REC, the revised error component estimator, is defined as
the member of the r-class with $r = r_1$. The estimator is called b*. This
value in principle can be determined whether or not A1 is true and we
advocate its use in general. As previously indicated, we shall below
indicate other cases where it is optimal.
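The minimizing values of r for the two limits in (41) can be checked with a simple grid search. The following sketch, in plain Python, works with the bounds divided by the common factor $\lambda\gamma$; the values of $\mu$, $\sigma^2 + \mu^2$ and $\lambda$ are hypothetical inputs, and the function names are introduced only for this illustration.

```python
def upper_limit(r, mu, m):
    """Upper limit in (41) divided by lambda*gamma; m stands for sigma^2 + mu^2."""
    return 1 - 2 * r * mu + r * r * m

def lower_limit(r, mu, m, lam):
    """Lower limit in (41) divided by lambda*gamma."""
    return upper_limit(r, mu, m) / (1 + lam * r) ** 2

mu, m = 1.39, 2.60          # hypothetical moments of gamma_hat / gamma
grid = [i / 10000 for i in range(20001)]      # r from 0 to 2

r1_closed = mu / m                             # closed form for the upper limit
r_upper = min(grid, key=lambda r: upper_limit(r, mu, m))

# The minimizer of the lower limit moves toward r1 as lambda shrinks
r_lower = {lam: min(grid, key=lambda r: lower_limit(r, mu, m, lam))
           for lam in (0.5, 0.1, 0.001)}
```

With these inputs `r_upper` agrees with `r1_closed` to grid accuracy, and the entries of `r_lower` approach it as $\lambda$ decreases, which is the behaviour described in the text.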

Under assumption A1 we find these conclusions concerning EC, CV,
and REC:

C1.     REC is more efficient than CV.

C2.     REC is known to be more efficient than EC in this sense:

a.  The upper limit of the inequality (41) for REC is below
the upper limit for EC.

b.  The lower limit of the inequality is below the lower limit
for EC as long as $\lambda^2 < \sigma^2 + \mu^2$. Since $\mu > 1$ in most cases, this
allows a very large $\lambda$.

c.  RV(b*) is strictly less than RV($b_e$) by (41) if
$(1 + \lambda)^2 < \sigma^2 + \mu^2$. We believe in practice b* will always be
superior to $b_e$.

C3.     EC is more efficient than CV if $\sigma^2 + \mu^2 < 2\mu$. It is less
efficient if $\sigma^2 + \mu^2 > 2(\mu + \lambda) + \lambda^2$. Intermediate cases are uncertain.

Following section 10 we provide a specific simple example so one
can see the numerical efficiency gain.

9.  Distribution of $\hat\gamma$.

The preceding results are specific and useful for any distribution
of $\hat\gamma$. We now turn to the specific distribution that arises from our
estimates (25) and (37). The distribution stated below (25) depends on the
original assumption that all errors have normal distribution. In this case
$\hat\gamma/\gamma$ was F with n and q degrees of freedom. For the present we ignore the
restriction that $\hat\gamma \le 1$ but return to it later. We recall that n, defined
in (24), depends on NT and will in almost every case be fairly large; q on
the other hand is N-1-k and can be very small. For example, one could
have N = 7, k = 5 so q = 1. Indeed an important reason for later
considering other estimates for $\sigma_1^2$ is that q here can be zero so the estimate
would not exist.

For this F distribution we have (for q > 4)

$\mu = q/(q-2)$

$\sigma^2 + \mu^2 = [q^2(n + 2)]/[(q - 2)(q - 4)n]$

(43)  $r_1 = [(q - 4)n]/[q(n + 2)]$

We see that the mean of F becomes infinite when q = 2 and $\sigma^2$ does
when q = 4.
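The F-distribution moments and the resulting $r_1$ in (43) are easy to tabulate. A minimal sketch in plain Python follows; the function name `f_moments` and the sample arguments are hypothetical.

```python
def f_moments(n, q):
    """Mean, second moment and r1 = mean / second moment for F(n, q), q > 4."""
    if q <= 4:
        raise ValueError("the second moment of F(n, q) requires q > 4")
    mu = q / (q - 2)
    second = q * q * (n + 2) / ((q - 2) * (q - 4) * n)
    return mu, second, mu / second

mu, second, r1 = f_moments(n=98, q=5)
# r1 simplifies to (q - 4) * n / (q * (n + 2)), as in (43)
```

For small q the value of $r_1$ is well below one, so the untruncated REC places much less weight on the between-state estimate than EC does.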

In (38) we found that the fact that $\hat\gamma \le 1$ was very useful. It plays
an extremely important role here, too. Under (37) $\hat\sigma_1^2 \ge \hat\sigma_u^2$, $0 \le \hat\gamma \le 1$ so
$0 \le \hat\gamma/\gamma \le 1/\gamma$. Thus the range of $\hat\gamma/\gamma$ is finite and all moments therefore must
exist. The distribution is now a "truncated F" [F(n, q, F*)] with (n, q)
degrees of freedom and an upper limit F*. By this we mean the corresponding
F for values $0 \le F < F^*$, together with a "spike" at F* whose probability is
$P(F \ge F^*)$. In our situation $F^* = 1/\gamma$. Since $\gamma$ is an unknown parameter we
cannot know F* or the resulting mean or variance of the truncated F but can
examine them conditional on various F*.
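Moments of the truncated F can be approximated by direct numerical integration. The sketch below, in plain Python, treats the limiting case of an infinite numerator degrees of freedom, in which F(n, q) reduces to q/X with X a chi-square variable on q degrees of freedom; that limit, the midpoint rule, the integration range and the function names are all assumptions made for this illustration. The integrand is well behaved for q of 2 or more.

```python
import math

def chi2_pdf(x, q):
    """Density of a chi-square variable with q degrees of freedom."""
    norm = 2 ** (q / 2) * math.gamma(q / 2)
    return x ** (q / 2 - 1) * math.exp(-x / 2) / norm

def truncated_f_moments(q, f_star, steps=200_000, upper=200.0):
    """Mean and second moment of min(F, f_star) with F = q / X, X ~ chi2(q).

    All probability above f_star is placed at f_star (the "spike").
    """
    h = upper / steps
    mean = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * h                 # midpoint rule
        weight = chi2_pdf(x, q) * h
        f = min(q / x, f_star)
        mean += f * weight
        second += f * f * weight
    return mean, second

mean, second = truncated_f_moments(q=5, f_star=3.0)
r_opt = mean / second                     # optimal r for the upper limit
```

For q = 5 and F* = 3 this gives a mean close to 1.39 and a second moment close to 2.60, so the ratio `r_opt` is a little above one half.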
The moments of F(n, q, F*) were found by numerical integration
and appear in Table 1. $n \to \infty$ was employed since in general use n will be
large. The truncation reduces $\mu$ and $(\sigma^2 + \mu^2)$ and also increases r. It
is clear that a prior point estimate or distribution of $\gamma$ and hence F* is
required to select the most efficient estimator, and the reader may consider
the optimal use of such prior information. We propose that a broad rule
of thumb be employed and will show where it is better than alternatives.
The rule is

(44)  $r = [(q + 4)n]/[(q + 11)(n + 2)]$   if $q < 15$

      $r = [(q - 4)n]/[q(n + 2)]$          if $q \ge 15$

This rule was devised as a simple way to approximate Table 1 when F* = 3
and to include the effect of n. It may be shown that it leads to a
smaller relative variance than EC for the values in the Table as long
as either $F^* \le 5$ or $q \ge 4$. REC is now defined as the member of the r-class
with r defined as in (44). Two conclusions can be derived.

C4.     REC is superior to either EC or CV if $F^* \le 5$ or $q \ge 4$.
If $F^* > 5$ and $q \le 3$ then the r-class should be employed with an appropriate
(small) value of r.

C5.     EC is superior to CV if: F* = 2 for any q; F* = 3 and $q \ge 4$; F* = 5
and $q \ge 7$; F* > 5 and $q \ge 8$. This is found by direct application of C3 to
Table 1. Clearly the small sample properties are very dependent on the
true unknown $\gamma$. Monte Carlo studies should reach various conclusions
according to the variances selected in advance for analysis.
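The rule of thumb (44) is a two-branch formula in q and n and can be written as a short function. The following is a minimal sketch in plain Python; the function name `rule_of_thumb_r` is hypothetical.

```python
def rule_of_thumb_r(q, n):
    """Rule (44): the value of r used to define REC.

    q = N - 1 - k and n = N(T - 1) - k are the degrees of freedom of the
    between and within variance estimates.
    """
    if q < 15:
        return (q + 4) * n / ((q + 11) * (n + 2))
    return (q - 4) * n / (q * (n + 2))

r_small_q = rule_of_thumb_r(q=5, n=98)    # first branch
r_large_q = rule_of_thumb_r(q=20, n=98)   # second branch, same form as (43)
```

The two branches nearly coincide at q = 15, so the rule changes smoothly as the number of states grows.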

It is important to note that in Table 1, the biggest effect on r
is created by F* and not the degrees of freedom. This implies intuitively
that the results will not materially be altered if the underlying distribution
is different from F as long as the truncation is present.


Table 1

Truncated F Distribution
Mean, Variance, Optimal r
Numerator Degrees of Freedom = $\infty$

Denominator                              Truncation = F*
Degrees of
Freedom, q                        2.      3.      5.     10.   infinite

    1.    $\mu$                  1.44    1.92    2.69    4.14      *
          $\sigma^2 + \mu^2$     2.54    4.90   11.01   32.35      *
          r                       .57     .39     .24     .13      0

    2.    $\mu$                  1.35    1.68    2.13    2.78      *
          $\sigma^2 + \mu^2$     2.23    3.87    7.41   16.82      *
          r                       .60     .43     .29     .16      0

    3.    $\mu$                  1.30    1.55    1.83    2.16    3.00
          $\sigma^2 + \mu^2$     2.05    3.29    5.52   10.10      *
          r                       .63     .47     .33     .21      0

    4.    $\mu$                  1.26    1.46    1.65    1.82    2.00
          $\sigma^2 + \mu^2$     1.93    2.89    4.36    6.72      *
          r                       .65     .51     .38     .27      0

    5.    $\mu$                  1.24    1.39    1.52    1.62    1.67
          $\sigma^2 + \mu^2$     1.84    2.60    3.59    4.87    8.33
          r                       .67     .53     .42     .33     .20

    6.    $\mu$                  1.22    1.34    1.43    1.49    1.50
          $\sigma^2 + \mu^2$     1.77    2.38    3.05    3.78    4.50
          r                       .69     .56     .47     .39     .33

    7.    $\mu$                  1.20    1.30    1.37    1.40    1.40
          $\sigma^2 + \mu^2$     1.71    2.20    2.67    3.10    3.27
          r                       .70     .59     .51     .45     .42

    8.    $\mu$                  1.19    1.27    1.32    1.33    1.33
          $\sigma^2 + \mu^2$     1.65    2.06    2.39    2.66    2.67
          r                       .72     .62     .55     .50     .50

r = $\mu/(\sigma^2 + \mu^2)$
* infinite

We have at this point developed the REC estimator and have shown it
to be broadly superior to EC and to CV. It remains to expand the analysis
in three ways. These are first to relax the initial assumption A1; second
to consider other estimators of $\sigma_1^2$ and $\sigma_u^2$; and third to present the estimators
for a three component model.

10.  Alternative  Assumptions  about  Z. 

The analysis above depends on A1 and we now discuss a variety of
means by which comparable results may be achieved. While the family
ultimately outlined here is not all inclusive, it contains a wide range
of different kinds of situations. Initially we will state the major
alternative assumption and then, since its meaning may not be instantly
clear, we will consider specific situations under which it will arise.

For the matrices $Z_1$ or $Z_2$ we may write eigen vectors and eigen
values in a matrix equation; e.g.

$Z_2 A_2 = A_2 \Lambda_2$

where $A_2$ is an orthogonal matrix of eigen vectors and $\Lambda_2$ a diagonal matrix
of eigen values. $A_2$ may not be unique because its columns may be permuted
along with elements of $\Lambda_2$ and, if there are multiple eigen values, there
are multiple solutions. We assume the following:

A2      For the matrices $Z_2$ and $Z_1$ there exist matrices of eigen vectors
$A_2$ and $A_1$ such that $A_1 = A_2 = A$. It is permissible to perform a single
variance transformation on both Z matrices before finding eigen vectors.

One way of achieving this is assumption A1 above. Another direct
way is the following:

A3      $Z_1$ and $Z_2$ are diagonal matrices. This would arise when all the
Xs are uncorrelated both within states and between states. Incidentally,
this clearly includes the case where there is a single regressor.

Other possibilities are best approached by first performing a
variance transformation on $Z_1$ and $Z_2$ as mentioned in the assumption. Form
the diagonal matrix Q such that each diagonal element is the square root
of the reciprocal of the diagonal of $Z_2$, i.e.

(45)  $Q_{ii} = (Z_2)_{ii}^{-1/2}$

      $Q_{ij} = 0$,  $i \ne j$

Define

(46)  $Z_2^* = QZ_2Q$

      $Z_1^* = QZ_1Q$

It is clear that by construction, the diagonal of $Z_2^*$ consists of ones.
We assume:

A4      The diagonal elements of $Z_1^*$ are equal to each other. The
implication of this is that variances in $Z_1$ are proportional to those in
$Z_2$. This transformation will be made before A2 is applied. Its use,
as seen below, is to transform both X and $\beta$ without change of the content
of the analysis. Along with A4, any one of the following is sufficient for
A2.

A5      There are two explanatory variables in the model (k = 2).

A6      $Z_1^*$ and $Z_2^*$ have a single constant everywhere off the main diagonal
(but these two may be different).

A7      $Z_1^*$ and $Z_2^*$ are each circular symmetric matrices.

A8      $Z_1^*$ and $Z_2^*$ are each tridiagonal matrices.

See Press [9] for specific discussion of the latter two assumptions.
Another possibility is that $Z_1$ and $Z_2$ are each conformably block diagonal
with separate blocks satisfying different assumptions above. In general
terms these assumptions say that one makes a single variance transformation
to $Z_1$ and $Z_2$ and the results are pattern matrices of a single type but
different coefficients.

Now we proceed to assume A4 and A2 are consecutively employed.

(47)  $Z_2^* A = (QZ_2Q) A = A\Lambda_2$

      $Z_1^* A = (QZ_1Q) A = A\Lambda_1$

Define M as the reciprocal square root of $\Lambda_2$ so that:

(48)  $M^2 \Lambda_2 = I$

Reconsider the original model (1).

(49)  $y = \beta_0 i + X\beta + \varepsilon$

        $= \beta_0 i + XQAM(QAM)^{-1}\beta + \varepsilon$

Define

(50)  $\delta = (QAM)^{-1}\beta$  so that

(51)  $y = \beta_0 i + (XQAM)\delta + \varepsilon$

We will now examine the variance matrix of the transformed coefficient
vector $\delta$. This will be minimized in the sense that alternative estimators
will have variance matrices which differ by a positive semi-definite
matrix, indicating the results will also hold for estimates of $\beta$.

The transformation QAM alters the matrices $Z_1$ and $Z_2$. These
become:

(52)  $Z_2^{**} = M'A'Q'Z_2QAM = I$

      $Z_1^{**} = M'A'Q'Z_1QAM = M\Lambda_1M = \Lambda_1\Lambda_2^{-1} = \Lambda$

$\Lambda$ is a diagonal matrix where the i-th diagonal element is the
ratio of the eigen value of $Z_1^*$ to the corresponding eigen value of $Z_2^*$.

(53)  $\lambda_i = \lambda_{1i}\lambda_{2i}^{-1}$

We proceed directly to the variance of $b_r$ (30) where $b_r$ is now
an estimate of $\delta$.

(54)  var$(b_r|\hat\gamma) = \sigma_u^2 [I + r\hat\gamma\Lambda]^{-1}[I + (r^2\hat\gamma^2/\gamma)\Lambda][I + r\hat\gamma\Lambda]^{-1}$

Clearly all off diagonal elements (covariances) are zero and
diagonal elements may be written:

(55)  var$(b_{ri}|\hat\gamma) = \sigma_u^2 [1 + \lambda_i r\hat\gamma]^{-2}[1 + (r^2\hat\gamma^2/\gamma)\lambda_i]$

Previously this depended on a scalar $\lambda$ for the entire matrix
but now it depends on $\lambda_i$. Relative variance continues to be as in (33)
but is unique for each parameter in $\delta$.

(56)  $RV(b_{ri}|\hat\gamma) = \lambda_i\gamma(1 - r\hat\gamma/\gamma)^2/(1 + \lambda_i r\hat\gamma)^2$

This result is identical to (34) except that the subscript is
added to $\lambda$. Our justification for choosing the upper limit value in (42) to
determine r is strengthened because we are free of dependence on $\lambda$.
All subsequent results hold in their entirety but now apply to each
parameter being estimated. C1 can be restated as follows:

The variance of the CV estimator of $\delta$ exceeds that of the REC
estimator by a positive semi-definite matrix. Since this difference is
true for estimators of $\delta$, it is also true of the comparable estimators of
$\beta$. Each other conclusion has a comparable interpretation showing in
particular REC to be better than the alternatives. We should note that
in some cases $\lambda$ may be fairly large, leading to wide inequalities in (41)
but also leading to larger relative efficiency differences.

An example is presented to make the potential efficiency gain
more real. This example falls under A4 and A5.

$X'P_4X = Z_2 = \begin{pmatrix} 16 & -36 \\ -36 & 100 \end{pmatrix}$

$X'P_3X = Z_1 = \begin{pmatrix} 4 & 8 \\ 8 & 25 \end{pmatrix}$

The only restriction is that variances are proportional: i.e. 16/100 = 4/25.
The example was selected so the pattern of variation differs substantially,
that is, the cross products are large negative and positive numbers
(-36 and +8). The variance transformation is performed and then eigen
vectors found:

$Q = \begin{pmatrix} .25 & 0 \\ 0 & .10 \end{pmatrix}$     $A = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$

The eigen values are:

$\lambda_{2i} = [.1 \;\; 1.9]$     $\lambda_{1i} = [.45 \;\; .05]$

$\lambda_1 = 4.5$     $\lambda_2 = .02631$

Assume also  $\gamma = 1/3$,  q = 5,  n = 98.

Then by (44)  r = .551

For $\delta_1$:   RV(CV) = 1.5
             RV(EC) $\le$ 1.5 (.82) = 1.23
             RV(REC) $\le$ 1.5 (.26) = .39

For $\delta_2$:   RV(CV) = .0088
             RV(EC) $\le$ .0072
             RV(REC) $\le$ .0022

The figures indicate that there is little difference in the three
estimators relative to $\delta_2$ but substantial difference in $\delta_1$. For $\delta_1$, CV has
variance that is 150% larger than GLS with known variance matrix while
EC has variance up to 123% larger and REC up to 39% larger. Thus the
variance of CV is 12% greater than that of EC and 80% greater than REC.
For $\delta_2$, the ordering is the same but REC is only known to be 0.66%
superior to CV.
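The numbers in this example can be reproduced step by step. The sketch below is in plain Python and is specific to the 2 x 2 case, where matrices with equal diagonal elements share the eigen vectors (1, 1) and (1, -1); the helper names are hypothetical, and the moments 1.39 and 2.60 are the Table 1 entries for q = 5 and F* = 3.

```python
import math

Z2 = [[16.0, -36.0], [-36.0, 100.0]]
Z1 = [[4.0, 8.0], [8.0, 25.0]]

def variance_transform(Z, Z_ref):
    """Return Q Z Q with Q the diagonal matrix of (45) built from Z_ref."""
    q = [1.0 / math.sqrt(Z_ref[i][i]) for i in range(2)]
    return [[q[i] * Z[i][j] * q[j] for j in range(2)] for i in range(2)]

def eigen_values_equal_diagonal(Zs):
    """Eigen values of a 2 x 2 symmetric matrix with equal diagonal elements,
    ordered to match the eigen vectors (1, 1) and (1, -1)."""
    return [Zs[0][0] + Zs[0][1], Zs[0][0] - Zs[0][1]]

lam2 = eigen_values_equal_diagonal(variance_transform(Z2, Z2))   # about .1, 1.9
lam1 = eigen_values_equal_diagonal(variance_transform(Z1, Z2))   # about .45, .05
lam = [a / b for a, b in zip(lam1, lam2)]                        # equation (53)

gamma, q, n = 1.0 / 3.0, 5, 98
r = (q + 4) * n / ((q + 11) * (n + 2))       # rule (44), branch q < 15
mu, second = 1.39, 2.60                      # Table 1 entries, q = 5, F* = 3

rv_cv = [l * gamma for l in lam]
rv_ec_upper = [l * gamma * (1 - 2 * mu + second) for l in lam]
rv_rec_upper = [l * gamma * (1 - 2 * r * mu + r * r * second) for l in lam]
```

The three lists hold the relative variance of CV and the upper limits for EC and REC, first for $\delta_1$ and then for $\delta_2$.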

This concludes the discussion of alternative assumptions about Z.
We turn in the next section to alternative variance estimates.

11.   Alternative Variance Estimates.

The estimates above of $\sigma_1^2$ and $\sigma_u^2$, (18) and (24), are not the only
such estimates. Arora [1] employs those given here. Fuller and Battese
[3, 4] use $\hat\sigma_u^2$ but for $\sigma_1^2$ employ the estimator below, while Wallace and
Hussain [11] use estimates which are each different. We will consider the
merits of the major alternative forms.

Recall that $\hat\sigma_1^2$ and $\hat\sigma_u^2$ are advantageous because they are each chi
square with q and n degrees of freedom respectively. Their disadvantage,
which we will show, is that they "lose" more degrees of freedom than the
model requires and hence are, in certain cases, inefficient. This is of
importance particularly when there are few (or zero) degrees of freedom.

In  the  following  we  need  some  known  results  concerning  quadratic  forms. 

1.  Assume $X$ is $N(0, \sigma^2 I)$ and $A$ is idempotent of rank $r$.

Then

$$Q = X'AX \sim \sigma^2 \cdot \text{chi square}\,(r)$$

Its cumulants are [6, p. 168]

(57)       $K_s(Q) = C_s \sigma^{2s} r$

where $C_s = 2^{s-1}(s-1)!$

The cumulants convey the same information as moments [5, p. 20].
The first four are:

$$K_1 = \mu_1; \quad K_2 = \sigma^2; \quad K_3 = \mu_3; \quad K_4 = \mu_4 - 3\sigma^4$$

where $\mu_3$ and $\mu_4$ are the third and fourth moments about the mean.

All cumulants for $s \ge 3$ in a normal distribution are zero.

2.  Assume $X \sim N(0, V)$ and $A$ is a general symmetric matrix. Then
$Q = X'AX$ has the following cumulants [7, p. 153].

(58)       $K_s(Q) = C_s \sum_i \lambda_i^s = C_s \,\mathrm{tr}\,(AV)^s$

where the $\lambda_i$ are the eigenvalues of $AV$ and $\mathrm{tr}\,(AV)$ is its trace.

3.  Assume $X \sim N(0, V)$ and $A$ and $B$ are general symmetric matrices. Then

$$\mathrm{covar}(X'AX, X'BX) = 2\,\mathrm{tr}\,(AVBV).$$

This is obvious from the variance of $X'(A + B)X$.
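These results are easy to verify numerically. The sketch below uses plain lists as matrices; the function names and the test matrices are ours and purely illustrative. It checks that an idempotent matrix of rank one with V = I gives the chi square cumulants C_s, and that the covariance formula agrees with the variance of X'(A + B)X.

```python
from math import factorial

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def c(s):
    """C_s = 2^(s-1) (s-1)!"""
    return 2 ** (s - 1) * factorial(s - 1)

def cumulant(s, a, v):
    """s-th cumulant of X'AX for X ~ N(0, V): C_s tr[(AV)^s]."""
    av = matmul(a, v)
    power = av
    for _ in range(s - 1):
        power = matmul(power, av)
    return c(s) * trace(power)

def covariance(a, b, v):
    """covar(X'AX, X'BX) = 2 tr(AVBV)."""
    return 2 * trace(matmul(matmul(a, v), matmul(b, v)))

identity3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
averaging = [[1.0 / 3.0] * 3 for _ in range(3)]   # idempotent, rank 1
chi_cumulants = [cumulant(s, averaging, identity3) for s in (1, 2, 3, 4)]

A = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 0.0, 1.0]]
B = [[2.0, 1.0, 0.0], [1.0, 1.0, 0.5], [0.0, 0.5, 3.0]]
V = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]
A_plus_B = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

# var(Q_A + Q_B) = var(Q_A) + var(Q_B) + 2 covar(Q_A, Q_B)
lhs = cumulant(2, A_plus_B, V)
rhs = cumulant(2, A, V) + cumulant(2, B, V) + 2 * covariance(A, B, V)
print(chi_cumulants, lhs, rhs)
```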

Now  we  will  examine  variance  estimates  of  the  type  employed  by 
Wallace  and  Hussain  [11].  First  employ  ordinary  least  squares  to  estimate 
(1)  and  find  the  residual  vector  e. 

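$$e = M\varepsilon = [I - (\bar J_N \otimes \bar J_T) - X(X'X)^{-1}X']\,\varepsilon$$

Define

$$Q_3^{\circ} = e'P_3e = \varepsilon'MP_3M\varepsilon$$

$$Q_4^{\circ} = e'P_4e = \varepsilon'MP_4M\varepsilon$$

The distributions of $Q_3^{\circ}$ and $Q_4^{\circ}$ depend on

$$Q_3 = P_3MVMP_3$$

$$Q_4 = P_4MVMP_4$$

where these are symmetric matrices which serve the role of $AV$ in (58).
First examine $Q_4$:

$$Q_4 = \sigma_u^2(P_4 - K_4^{\circ}) + (\sigma_1^2 - \sigma_u^2)K_{34}^{\circ}$$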
where

$$K_i^{\circ} = P_iX(X'X)^{-1}X'P_i$$

$$K_i = (X'X)^{-1}X'P_iX$$

$$k_i = \mathrm{tr}\,K_i = \mathrm{tr}\,K_i^{\circ}$$

$K_{ij} = K_iK_j$, etc. for any sequence of subscripts.

$k_{i^r j^s}$ is $k$ with $r$ subscripts each equal to $i$ followed by
$s$ each equal to $j$.

$K_3 = (Z_1 + Z_2)^{-1}Z_1$ and is non negative definite. Hence

$$k_i \ge 0$$

$$k_3 + k_4 = k$$

$$k_{33} + k_{34} = k_3 \quad \text{etc. for any complementary pair.}$$

Employing eigenvalues one may show

$$k_{33} \ge k_3^2/k$$

$$k_{34} \le k_3k_4/k$$


We may proceed further under the assumptions employed above.
Assume A1:

$$k_{3^r4^p} = \mu^r(1 - \mu)^p k, \qquad \mu = \lambda/(1 + \lambda)$$

Assume A2:

$$k_{3^r4^p} = \sum_i \mu_i^r(1 - \mu_i)^p, \qquad \mu_i = \lambda_i/(1 + \lambda_i)$$
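Under A2 the traces are simple sums over the roots, so the identities and inequalities above can be checked directly. The sketch below uses the two roots of the earlier example; the function name `k_trace` is ours.

```python
def k_trace(r, p, lambdas):
    """k with r subscripts equal to 3 and p equal to 4 under A2:
    sum over i of mu_i^r (1 - mu_i)^p, with mu_i = lambda_i / (1 + lambda_i)."""
    mus = [lam / (1.0 + lam) for lam in lambdas]
    return sum(mu ** r * (1.0 - mu) ** p for mu in mus)

lambdas = [4.5, 0.02631]      # roots from the example
k = len(lambdas)              # number of regressors

k3 = k_trace(1, 0, lambdas)
k4 = k_trace(0, 1, lambdas)
k33 = k_trace(2, 0, lambdas)
k34 = k_trace(1, 1, lambdas)

print("k3 + k4   =", k3 + k4)            # equals k
print("k33 + k34 =", k33 + k34, "k3 =", k3)
print("k33 >= k3^2/k  :", k33 >= k3 ** 2 / k)
print("k34 <= k3*k4/k :", k34 <= k3 * k4 / k)
```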

12.   Estimate of $\sigma_u^2$.

Define

(59)  $\sigma_u^{*2} = Q_4^{\circ}/m_4$

(60)  $m_4 = N(T - 1) - k_4$

(61)  $E\,\sigma_u^{*2} = \sigma_u^2(1 + \psi k_{34}/m_4)$

(62)  $\psi = (\sigma_1^2/\sigma_u^2) - 1$

(63)  $\mathrm{var}\,\sigma_u^{*2} = \dfrac{2\sigma_u^4}{m_4}\left[1 - \dfrac{1}{m_4}\{k_{34} - 2\psi k_{343} - \psi^2 k_{3434}\}\right]$

(61) and (63) show the dependence of this estimator on $\psi$, defined in (62),
and therefore on $\sigma_1^2/\sigma_u^2$. Hence we must assume $\psi$ is small. Given this and
the small size of $k_{34}$, we note that $\sigma_u^{*2}$ is approximately unbiased and the
bracketed term in the variance equation is approximately 1. If $\psi$ is small
the cumulants are:

(64)  $K_s(\sigma_u^{*2}) = C_s\sigma_u^{2s}m_4^{1-s}\left[1 - \dfrac{1}{m_4}(k_3 - k_{3^s}) + O(k_{34}/m_4)\right]$

The last term signifies terms of order at most $k_{34}/m_4$.
Deleting that term

(65)  $K_s(\sigma_u^{*2}) \le C_s\sigma_u^{2s}m_4^{1-s}$

which are the cumulants of chi square with $m_4$ degrees of freedom. Hence
$\sigma_u^{*2}$ is superior, in the sense of cumulants, to such a distribution and
hence also to $\hat\sigma_u^2$.

The smallness of $\psi$ can be made more specific by employing A2.
In that case

(66)  $K_s(\sigma_u^{*2}) = C_s\sigma_u^{2s}m_4^{1-s}\left[1 + \dfrac{1}{m_4}\{-k_3 + \sum_i \mu_i^s(1 + \psi - \psi\mu_i)^s\}\right]$

This is superior to $\chi^2(m_4)$ if the term in { } is negative for all $s$.
In turn this is true if each element in the sum is less than or equal to
1 or:

$$1 \ge \mu_i^s(1 + \psi - \psi\mu_i)^s$$

$$\psi \le 1/\mu_i$$

This is also a necessary condition. In our example above
$\lambda = (4.5, .026)$ so $\mu = (.82, .025)$,

thus

$$\psi \le 1.22 \quad\text{or}\quad \sigma_1^2 \le 2.22\,\sigma_u^2 \quad\text{or}\quad \gamma \ge .45.$$

In every case under A2, $\sigma_1^2 \le 2\sigma_u^2$ will suffice so that all cumulants of
$\sigma_u^{*2}$ are less than those of $\hat\sigma_u^2$.
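The bounds quoted for the example, and the agreement between the variance formula and the second cumulant under A2, can be checked numerically. The sketch below rests on our reading of the formulas, stated in the comments: the variance is (2 sigma_u^4 / m_4)[1 - (k_34 - 2 psi k_343 - psi^2 k_3434)/m_4], and the s-th cumulant under A2 has the braces term -k_3 + sum of mu_i^s (1 + psi - psi mu_i)^s. The value m_4 = 100 and psi = 0.5 are hypothetical.

```python
from math import factorial

def c(s):
    """C_s = 2^(s-1) (s-1)!"""
    return 2 ** (s - 1) * factorial(s - 1)

def mus_from(lambdas):
    return [lam / (1.0 + lam) for lam in lambdas]

def variance_formula(lambdas, psi, m4, sigma_u_sq=1.0):
    """Variance of the estimator, with the k traces evaluated under A2."""
    mus = mus_from(lambdas)
    k34 = sum(m * (1.0 - m) for m in mus)
    k343 = sum(m ** 2 * (1.0 - m) for m in mus)
    k3434 = sum(m ** 2 * (1.0 - m) ** 2 for m in mus)
    inner = k34 - 2.0 * psi * k343 - psi ** 2 * k3434
    return 2.0 * sigma_u_sq ** 2 / m4 * (1.0 - inner / m4)

def cumulant_a2(s, lambdas, psi, m4, sigma_u_sq=1.0):
    """s-th cumulant of the estimator under A2."""
    mus = mus_from(lambdas)
    braces = -sum(mus) + sum((m * (1.0 + psi - psi * m)) ** s for m in mus)
    return c(s) * sigma_u_sq ** s * m4 ** (1 - s) * (1.0 + braces / m4)

lambdas = [4.5, 0.02631]
psi_bound = min(1.0 / m for m in mus_from(lambdas))   # psi <= 1/mu_i for every i
gamma_bound = 1.0 / (1.0 + psi_bound)                 # gamma = sigma_u^2 / sigma_1^2

var_direct = variance_formula(lambdas, 0.5, 100.0)
var_from_cumulant = cumulant_a2(2, lambdas, 0.5, 100.0)
print(round(psi_bound, 2), round(gamma_bound, 2), var_direct, var_from_cumulant)
```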

In many cases under A1 or A2, the largest $\lambda_i$ will be, say, 1/10.
In this case $\mu = 1/11$ and $\psi \le 11$ is sufficient for $\sigma_u^{*2}$ to be superior to
$\hat\sigma_u^2$. Therefore, we conclude that $\sigma_u^{*2}$ is the superior estimator in cumulant
if either $\sigma_1^2/\sigma_u^2$ is small or the largest root $\lambda$ is small.

Despite the advantages of $\sigma_u^{*2}$, the gain will in almost every case
be small because there are many degrees of freedom. Hence it seems wise
to generally employ $\hat\sigma_u^2$.

13.   Estimates of $\sigma_1^2$.

$$Q_3 = \sigma_1^2[(I_N - \bar J_N) \otimes \bar J_T - 2K_3^{\circ} + K_{33}^{\circ}] + \sigma_u^2[K_3^{\circ} - K_{33}^{\circ}]$$

Define

(67)  $\sigma_1^{*2} = [Q_3^{\circ} - \hat\sigma_u^2 k_{34}]/m_3$

(68)  $m_3 = N - 1 - k + k_{44}$

This is the estimator of Fuller and Battese. For simplicity one
could employ $Q_3^{\circ}/(N-1)$ or $Q_3^{\circ}/m_3$ but these require assumptions that
$\sigma_u^2$ is small, etc. Cumulants can be found generally:

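(69)  $K_s(\sigma_1^{*2}) = C_s\sigma_1^{2s}m_3^{1-s}\left[1 - \dfrac{1}{m_3}(k_{44} - k_{4^{2s}}) + O(\gamma k_{34}/m_3)\right]$

The last term depends on $1/m_3$, which may not be very small, but it also has
$\gamma = \sigma_u^2/\sigma_1^2$ which is always less than one. (69) includes the effect of
$\hat\sigma_u^2$. To make (69) clearer we now employ A2. Under it:

(70)  $K_s(\sigma_1^{*2}) = C_s\sigma_1^{2s}m_3^{1-s}\Big[1 - \dfrac{1}{m_3}\sum_i (1 - \mu_i)^2\{1 - (1 - \mu_i)^{s-2}(1 - \mu_i + \gamma\mu_i)^s\} + \dfrac{1}{m_3}(N(T - 1) - k)^{1-s}(-\gamma k_{34})^s\Big]$

(71)  $K_s(\sigma_1^{*2}) \le C_s\sigma_1^{2s}m_3^{1-s}$.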
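The cumulants under A2 can be evaluated numerically. The sketch below encodes our reading of the formula, in which the term arising from the estimate of sigma_u^2 carries the exponent 1 - s on N(T - 1) - k; with that reading the first cumulant equals sigma_1^2 exactly, so the estimator is unbiased. The values N = 8, T = 14 and gamma = 1/3 are hypothetical, and the function names are ours.

```python
from math import factorial

def c(s):
    """C_s = 2^(s-1) (s-1)!"""
    return 2 ** (s - 1) * factorial(s - 1)

def m3_value(lambdas, n_states):
    """m_3 = N - 1 - k + k_44, with k_44 = sum (1 - mu_i)^2 under A2."""
    mus = [lam / (1.0 + lam) for lam in lambdas]
    return n_states - 1 - len(lambdas) + sum((1.0 - m) ** 2 for m in mus)

def cumulant_sigma1_star(s, lambdas, gamma, n_states, n_periods, sigma1_sq=1.0):
    """s-th cumulant of the estimator of sigma_1^2 under A2 (our reading)."""
    mus = [lam / (1.0 + lam) for lam in lambdas]
    k = len(lambdas)
    k34 = sum(m * (1.0 - m) for m in mus)
    m3 = m3_value(lambdas, n_states)
    n = n_states * (n_periods - 1) - k      # degrees of freedom of the sigma_u^2 estimate
    first = sum((1.0 - m) ** 2
                * (1.0 - (1.0 - m) ** (s - 2) * (1.0 - m + gamma * m) ** s)
                for m in mus)
    second = n ** (1 - s) * (-gamma * k34) ** s
    return c(s) * sigma1_sq ** s * m3 ** (1 - s) * (1.0 - first / m3 + second / m3)

lambdas = [4.5, 0.02631]
mean = cumulant_sigma1_star(1, lambdas, 1.0 / 3.0, 8, 14)
variance = cumulant_sigma1_star(2, lambdas, 1.0 / 3.0, 8, 14)
chi_square_variance = c(2) / m3_value(lambdas, 8)   # chi square bound with m_3 d.f.
print(mean, variance, chi_square_variance)
```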

The inequality in (71) is true for $s \ge 3$ without any additional assumptions.
When $s = 2$, the requirement is that $\gamma$ be a small amount less than one.
One form of requirement for (71) to hold is:

$$2[N(T - 1) - k] \ge k/(1 - \gamma)^{1/2}$$

For example if $\gamma = .99$ and $k = 4$, then (71) is true if $N(T - 1) \ge 24$.
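The requirement is easily evaluated for other values. A minimal sketch (the function name is ours) solves the inequality for the smallest admissible N(T - 1):

```python
import math

def min_within_df(k, gamma):
    """Smallest N(T-1) satisfying 2[N(T-1) - k] >= k / sqrt(1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return k + k / (2.0 * math.sqrt(1.0 - gamma))

bound = min_within_df(4, 0.99)
print(round(bound, 6))   # the example in the text: about 24
```

Even for gamma very close to one the requirement is mild, since the bound grows only with the inverse square root of 1 - gamma.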
This depends on the fact that at least one $\mu_i \le 1/2$. Our real interest
is var $(\hat\sigma_1^2)$ and one can show

$$\mathrm{var}\,\sigma_1^{*2} \le \mathrm{var}\,(\hat\sigma_1^2)$$

if $N(T - 1) - k \ge \tfrac{1}{4}k$

which will of course always be true.

We have shown that $\sigma_1^{*2}$ is cumulant superior to chi square $m_3$ as
well as to $\hat\sigma_1^2$ and hence should always be employed. Clearly there will be
some cases with small $N$ where $\sigma_1^{*2}$ exists and the alternative does not.

We will employ the chi square distribution as an approximation
as we return to REC but we note that the truncation of the F distribution
means the exact distribution is not required in any event. Thus we propose
that REC be employed as above but with $\gamma^* = \hat\sigma_u^2/\sigma_1^{*2}$ replacing $\hat\gamma$. The new
degrees of freedom are used in defining $r$ ($m_3$ replaces $q$).

In this section we have shown that the estimates of $\sigma_1^2$ of Fuller
and Battese are superior to those of Arora but that both serve well in the
REC estimator.


14.   Three  Component  REC  Model 

Now consider the original model with variance matrix given by (4).
The analysis was developed with $\sigma_2^2 = \sigma_u^2$ but that assumption is now
dropped and the third component restored. We employ four transformations
defined as in section 3. These are

(72)       $P_{35} = P_3 \cdot P_5 \qquad P_{36} = P_3 \cdot P_6$

           $P_{45} = P_4 \cdot P_5 \qquad P_{46} = P_4 \cdot P_6$
One may show

$$P_{35} + P_{45} = P_5; \quad P_{36} + P_{46} = P_6; \quad P_{35} + P_{36} = P_3; \quad \text{etc.}$$

When the model is transformed by $P_{35}$ we have the grand mean for all
observations, which is zero within $X$. $P_{45}$ yields mean values for each
time period; $P_{36}$ yields mean values for each state and $P_{46}$ yields
deviations from both the state mean and time mean.

When regressions are performed on each transformed model in turn:

$P_{35}$ yields only the estimate of $\beta_0$.

$P_{46}$ yields $b_4$ and a new $\hat\sigma_u^2$ which has $[(N - 1)(T - 1) - k]$
degrees of freedom.

$P_{36}$ yields $b_1$ and $\hat\sigma_1^2$ which are identical to $b_1$ and $\hat\sigma_1^2$ above.

$P_{45}$ yields $b_5$ and $\hat\sigma_2^2$ where the latter is chi square with $T - 1 - k$
degrees of freedom.

We also could define $\sigma_1^{*2}$ and $\sigma_2^{*2}$ as was done in the preceding
section and which would have the same implications. All of the estimators
under consideration are now weighted averages of $b_4$, $b_1$, and $b_5$.
Define the following.

$$\gamma_1 = \sigma_u^2/\sigma_1^2 \qquad \gamma_2 = \sigma_u^2/\sigma_2^2$$

$$Z_4 = X'P_4P_6X \qquad Z_5 = X'P_4P_5X$$

$$Z_1 = X'P_3P_6X = X'P_3X$$

$$Z_4 + Z_5 = Z_2$$

$$W_0 = Z_4 + r_a\hat\gamma_1Z_1 + r_b\hat\gamma_2Z_5$$

$$W_4 = W_0^{-1}Z_4$$

$$W_1 = r_a\hat\gamma_1W_0^{-1}Z_1$$

$$W_5 = r_b\hat\gamma_2W_0^{-1}Z_5$$

Then the r class estimator is defined as:

(73)  $b_r = W_4b_4 + W_1b_1 + W_5b_5$.

This estimator is again identical to the a class of Swamy and
Arora. Indeed the entire 7 parameter a class can be expressed as part of
the two parameter r class if we set:

(74)  $r_a = a_0/[a_6(a_1 + a_2)]$
      $r_b = a_3/[a_6(a_4 + a_5)]$

Although  the  a  notation  was  desirable  for  the  purposes  of  that 
article,    the   r  notation  is   simpler  and  more  sensible  for  general  estimation 
purposes. 
There are many specific estimators available for a 3 component
model such as this one. For each of the two ratios $\gamma_1$ and $\gamma_2$ we may employ
EC, CV, REC, or omit the component and return to a 2 component model. These
imply that $r_a = 1$, $r_a = 0$, $r_a = r_a^*$, or $r_a\hat\gamma_1 = 1$. Since the same four options
are available for the two components there are 4 x 4 = 16 different
specifications clearly available.
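The sixteen specifications are simply all pairings of the four options across the two components. A minimal sketch enumerating them (the labels are ours):

```python
from itertools import product

# Option for one component, with the implied setting of r
OPTIONS = {
    "EC": "r = 1",
    "CV": "r = 0",
    "REC": "r = r*",
    "omit component": "r * gamma_hat = 1",
}

# One choice for the first ratio, one for the second
specifications = list(product(OPTIONS, repeat=2))

for first, second in specifications:
    print(f"component 1: {first:15s} component 2: {second}")
print(len(specifications), "specifications")
```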

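The variance of $b_r$ is:

(75)  $\mathrm{var}(b_r \mid \hat\gamma_1, \hat\gamma_2) = \sigma_u^2\,W_0^{-1}[Z_4 + (r_a\hat\gamma_1)^2Z_1/\gamma_1 + (r_b\hat\gamma_2)^2Z_5/\gamma_2]\,W_0^{-1}$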

We  now  proceed  directly  to  an  assumption  about  Z  similar  to  that 
made  above: 

A9      After a single variance transformation $Q$ is applied to $Z_4$, $Z_1$ and $Z_5$
such that $Z_4^* = QZ_4Q$ etc., there exist eigenvector matrices for the transformed
$Z$ that are identical.

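This is applied as above to transform the parameter vector to a new
vector $\delta$. The eigenvalues are contained in $\Lambda_4$, $\Lambda_1$ and $\Lambda_5$.

Define

$$\lambda_{ai} = \lambda_{1i}/\lambda_{4i}, \qquad \lambda_{bi} = \lambda_{5i}/\lambda_{4i}$$

Then the relative variance of $b_{ri}$ may be shown to be:

(76)  $RV(b_{ri} \mid \hat\gamma_1, \hat\gamma_2) = Q^*/(1 + \lambda_{ai}r_a\hat\gamma_1 + \lambda_{bi}r_b\hat\gamma_2)^2 \le Q^*$

$$Q^* = \lambda_{ai}\gamma_1(1 - r_a\hat\gamma_1/\gamma_1)^2 + \lambda_{bi}\gamma_2(1 - r_b\hat\gamma_2/\gamma_2)^2 + \lambda_{ai}\lambda_{bi}\gamma_1\gamma_2(r_a\hat\gamma_1/\gamma_1 - r_b\hat\gamma_2/\gamma_2)^2$$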
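The relative variance can be checked against a direct computation for a single transformed coefficient. The sketch below is ours and rests on stated assumptions: the three component estimators are independent with variances proportional to 1/lambda_4, 1/(gamma_1 lambda_1) and 1/(gamma_2 lambda_5), the estimated ratios are held fixed, and relative variance means the ratio to the GLS variance minus one. In the code a = lambda_a gamma_1, b = lambda_b gamma_2, x = r_a gamma_1_hat/gamma_1 and y = r_b gamma_2_hat/gamma_2; the numerical values are hypothetical.

```python
def relative_variance_direct(a, b, x, y):
    """var(b_r)/var(GLS) - 1, computed from the weights and component variances."""
    var_ratio = (1.0 + a * x * x + b * y * y) * (1.0 + a + b) / (1.0 + a * x + b * y) ** 2
    return var_ratio - 1.0

def q_star(a, b, x, y):
    """Numerator Q* of the relative variance."""
    return a * (1.0 - x) ** 2 + b * (1.0 - y) ** 2 + a * b * (x - y) ** 2

def relative_variance_formula(a, b, x, y):
    """Q* divided by the squared weight total."""
    return q_star(a, b, x, y) / (1.0 + a * x + b * y) ** 2

a, b, x, y = 1.5, 0.4, 0.6, 1.3   # hypothetical values
direct = relative_variance_direct(a, b, x, y)
formula = relative_variance_formula(a, b, x, y)
print(direct, formula, q_star(a, b, x, y))
```

When x = y = 1, that is when both weights equal the true ratios, Q* vanishes and the estimator attains the GLS variance.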
$\mu_1$ and $\sigma_1^2$ are the mean and variance of $(\hat\gamma_1/\gamma_1)$; $\mu_2$ and $\sigma_2^2$ are the mean and
variance of $(\hat\gamma_2/\gamma_2)$; and $\sigma_{12}$ is the covariance. If $\hat\sigma_u^2$, $\hat\sigma_1^2$ and $\hat\sigma_2^2$ are
respectively chi square with $n$, $q_1$, and $q_2$ degrees of freedom then $\hat\gamma_1$ and
$\hat\gamma_2$ are each F with common numerator but independent denominator. If
these distributions are not truncated then:

(77)      $\sigma_{12} = (2/n)\,\mu_1\mu_2$

This value will be small when $n$ is large.

We minimize $Q^*$ over $r_a$ and $r_b$. The optimum value for $r_a^*$ after some
computation is found to be:

(78)       $r_a^* = r_1\left[1 - \dfrac{\lambda_{bi}\gamma_2}{1 + \lambda_{bi}\gamma_2}\{1 - r_b^*(\mu_2 + \sigma_{12}/\mu_1)\}\right]$

$r_1$ is the optimal r value for the two component model defined in (42).
A similar equation holds for $r_b^*$. It is seen from this that $r_a^*$ is approximated
by $r_1$ if $(\lambda_{bi}\gamma_2)$ is small or if $q_2$ is not small (so that $r_b^*$ is approximately
one). Because this approximation exists we define the three component REC
estimator as the r class with $r_a = r_1$ and $r_b = r_2$.

It remains for us to contrast the efficiency of the alternative
estimators with the three component model. We indicated below (74) the
variety of estimation methods available. The selection of estimation
method depends on the true value of $\gamma_1$ or $\gamma_2$. Hence the most efficient
method is 2 components if $\gamma$ is close to one. CV is advisable if $q$ is very
small or $\lambda$ large and EC is good if $q$ is large and $\gamma$ not close to one.

The most efficient estimator in every case is REC with appropriately
chosen $r$. In general use one may choose $r$ by the rules given in (42) but
shift to another estimator in special cases indicated by the previous
paragraph. The general use of any other estimator is clearly inefficient.

We have developed the r class of estimators and have shown that a
specific member, REC, should be used in essentially all error component
problems. In the process we have developed substantial information about
small sample properties of error components, covariance, and REC estimators.
Further theoretical and Monte Carlo analysis should broaden the specific
assumptions and results.